Effect of Cross-Language IR in Bilingual Lexicon Acquisition from Comparable Corpora
نویسندگان
چکیده
Within the framework of translation knowledge acquisition from WWW news sites, this paper studies issues on the effect of cross-language retrieval of relevant texts in bilingual lexicon acquisition from comparable corpora. We experimentally show that it is quite effective to reduce the candidate bilingual term pairs against which bilingual term correspondences are estimated, in terms of both computational complexity and the performance of precise estimation of bilingual term correspondences.
منابع مشابه
Bilingual Terminology Acquisition from Comparable Corpora and Phrasal Translation to Cross-Language Information Retrieval
The present paper will seek to present an approach to bilingual lexicon extraction from non-aligned comparable corpora, phrasal translation as well as evaluations on Cross-Language Information Retrieval. A two-stages translation model is proposed for the acquisition of bilingual terminology from comparable corpora, disambiguation and selection of best translation alternatives according to their...
متن کاملLearning bilingual translations from comparable corpora to cross-language information retrieval: hybrid statistics-based and linguistics-based approach
Recent years saw an increased interest in the use and the construction of large corpora. With this increased interest and awareness has come an expansion in the application to knowledge acquisition and bilingual terminology extraction. The present paper will seek to present an approach to bilingual lexicon extraction from non-aligned comparable corpora, combination to linguisticsbased pruning a...
متن کاملSentence Alignment in Parallel, Comparable, and Quasi-comparable Corpora
We explore the usability of different bilingual corpora for the purpose of multilingual and cross-lingual natural language processing. The usability of bilingual corpus is evaluated by the lexical alignment score calculated for the bi-lexicon pair distributed in the aligned bilingual sentence pairs. We compare and contrast a number of bilingual corpora, ranging from parallel, to comparable, and...
متن کاملXRCE Participation in CLEF 2002
In this paper, we describe the methods we used for the Cross-Lingual Evaluation Forum CLEF 2002, and more specifically for the GIRT Task. The methods are based on (1) the extraction of two bilingual lexicons, one from parallel corpora and the other one from comparable corpora, (2) the optimal combination of these bilingual lexicons in Cross-Language Information Retrieval and (3) the combination...
متن کاملAutomatic processing of multilingual medical terminology: applications to thesaurus enrichment and cross-language information retrieval
OBJECTIVES We present in this article experiments on multi-language information extraction and access in the medical domain. For such applications, multilingual terminology plays a crucial role when working on specialized languages and specific domains. MATERIAL AND METHODS We propose firstly a method for enriching multilingual thesauri which extracts new terms from parallel corpora, and seco...
متن کامل